Domain-Specific Knowledge Acquisition from Text

Authors

  • Dan I. Moldovan
  • Roxana Girju
  • Vasile Rus
Abstract

In many knowledge-intensive applications, it is necessary to have extensive domain-specific knowledge in addition to general-purpose knowledge bases. This paper presents a methodology for discovering domain-specific concepts and relationships in an attempt to extend WordNet. The method was tested on five seed concepts selected from the financial domain: interest rate, stock market, inflation, economic growth, and employment.

1 Desiderata for Automated Knowledge Acquisition

The need for knowledge

Knowledge is infinite, and no matter how large a knowledge base is, it cannot store all the concepts and procedures for all domains. Even if that were possible, knowledge is generative, and there is no guarantee that a system will have the latest information at all times. And yet, if we are to build common-sense knowledge processing systems in the future, it is necessary to have general-purpose and domain-specific knowledge that is up to date. Our inability to build large knowledge bases without much effort has impeded many ANLP developments. The most successful current Information Extraction systems rely on hand-coded linguistic rules representing lexico-syntactic patterns capable of matching natural language expressions of events. Because the rules are hand-coded, it is difficult to port systems across domains. Question answering, inference, summarization, and other applications can benefit from large linguistic knowledge bases.

The basic idea

A possible solution to the problem of rapid development of flexible knowledge bases is to design an automatic knowledge acquisition system that extracts knowledge from texts for the purpose of merging it with a core ontological knowledge base. Creating a knowledge base manually is time-consuming and error-prone, even for small application domains, and we believe that automatic knowledge acquisition and classification is the only viable solution for large-scale, knowledge-intensive applications. This paper presents an interactive method that acquires new concepts and connections associated with user-selected seed concepts, and adds them to the WordNet linguistic knowledge structure (Fellbaum 1998). The sources of the new knowledge are texts acquired from the Internet or other corpora. At present, our system works in a semi-automatic mode, in the sense that it acquires concepts and relations automatically, but their validation is done by the user. We believe that domain knowledge should not be acquired in a vacuum; it should expand an existing ontology with a skeletal structure built on consistent and acceptable principles. The method presented in this paper is applicable to any Machine Readable Dictionary. However, we chose WordNet because it is freely available and widely used.

Related work

This work was inspired in part by Marti Hearst's paper (Hearst 1998), in which she manually discovers lexico-syntactic patterns for the HYPERNYMY relation in WordNet. Much of the work in pattern extraction from texts was done to improve the performance of Information Extraction systems. Research in this area was done by Kim and Moldovan (1995), Riloff (1996), Soderland (1997), and others. The MindNet project at Microsoft (Richardson 1998) is an attempt to transform the Longman Dictionary of Contemporary English (LDOCE) into a form of knowledge base for text processing. Woods studied knowledge representation and classification for a long time (Woods 1991), and more recently has been trying to automate the construction of taxonomies by extracting concepts directly from texts (Woods 1997).

The Knowledge Acquisition from Text (KAT) system is presented next. It consists of four parts: (1) discovery of new concepts, (2) discovery of new lexical patterns, (3) discovery of new relationships reflected by the lexical patterns, and (4) the classification and integration of the knowledge discovered with a WordNet-like knowledge base.
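
To make the idea of a lexico-syntactic pattern for the HYPERNYMY relation concrete, here is a minimal Python sketch. It is not the KAT system or Hearst's procedure: it assumes a single illustrative pattern ("X such as Y1, Y2 and Y3"), approximates the hypernym by the one word preceding "such as", and uses no parser or WordNet lookup. The names `SUCH_AS`, `SEPARATOR`, and `extract_hypernym_pairs` are hypothetical.

```python
import re

# Assumed illustrative pattern: "<head word> such as <list of terms>".
# The tail runs up to the next '.', ';' or ':'; noun phrases are NOT parsed.
SUCH_AS = re.compile(r"(?P<hyper>\w+) such as (?P<tail>[^.;:]+)")

# Splits the tail on commas (optionally followed by and/or) or on a bare and/or.
SEPARATOR = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")


def extract_hypernym_pairs(sentence):
    """Return (hyponym, hypernym) candidates found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence.lower()):
        hypernym = match.group("hyper")
        for hyponym in SEPARATOR.split(match.group("tail")):
            hyponym = hyponym.strip()
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs


if __name__ == "__main__":
    text = "Economic indicators such as interest rates, inflation and employment."
    for hyponym, hypernym in extract_hypernym_pairs(text):
        print(f"{hyponym} IS-A {hypernym}")
```

On the sample sentence the sketch yields the candidate pairs (interest rates, indicators), (inflation, indicators), and (employment, indicators). In a semi-automatic setting like the one described above, such candidates would still need to be validated by the user before being added to the knowledge base.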

Similar resources

Domain-Specific Knowledge Acquisition and Classification Using WordNet

For many knowledge intensive applications, it is necessary to have extensive domain-specific knowledge in addition to general-purpose knowledge bases usually built around Machine Readable Dictionaries. This paper presents a methodology for acquiring domain specific knowledge from text and classifying the concepts learned into an ontology that extends WordNet. The method was tested for three see...

Supporting Software Language Engineering by Automated Domain Knowledge Acquisition

In model-driven engineering, domain-specific languages (DSLs) play an important role in providing well-defined environments for modeling different aspects of a system. Detailed knowledge of the application domain as well as expertise in language engineering is required to create new languages. This research work proposes automated knowledge acquisition to support language engineers in early lan...

Mining Ontologies from Text

Ontologies have become an important means for structuring knowledge and building knowledge-intensive systems. For this purpose, efforts have been made to facilitate the ontology engineering process, in particular the acquisition of ontologies from domain texts. We present a general architecture for discovering conceptual structures and engineering ontologies. Based on our generic architecture w...

Knowledge-based Knowledge Elicitation

A method for using the advantages of domain-specific knowledge acquisition for a general purpose knowledge acquisition tool is introduced. To adapt the knowledge acquisition tool for a specific application and a specific problem solving strategy (e.g. heuristic classification, such diagnostic strategies as establish and refine), acquisition knowledge bases (AKBs) are integrated in the system to...

Knowledge-Based Labeling of Semantic Relationships in English

An increasing number of NLP tasks require semantic labels to be assigned, not only to entities that appear in textual elements, but to the relationships between those entities. Interest is growing in shallow semantic role labeling as well as in deep semantic distance metrics grounded in ontologies, as each of these contributes to better understanding and organization of text. In this work I app...

Randomizing the Knowledge Acquisition Bottleneck

This paper addresses the knowledge acquisition bottleneck. It first takes an information-theoretic view of knowledge acquisition as having a basis in randomization theory and subsequently gives practical examples. The examples are taken from the field of expert compiler technology. Such technology can be used to effect the realization of fourth generation languages. These languages have been sh...

Publication date: 2000